ABSTRACT

Centrality is an important notion in network analysis and is 
used to measure the degree to which network structure con- 
tributes to the importance of a node in a network. While 
many different centrality measures exist, most of them apply 
to static networks. Most networks, on the other hand, are 
dynamic in nature, evolving over time through the addition 
or deletion of nodes and edges. A popular approach to an- 
alyzing such networks represents them by a static network 
that aggregates all edges observed over some time period. 
This approach, however, under- or overestimates the centrality
of some nodes. We address this problem by introducing a
novel centrality metric for dynamic network analysis. This 
metric exploits an intuition that in order for one node in a 
dynamic network to influence another over some period of 
time, there must exist a path that connects the source and 
destination nodes through intermediaries at different times. 
We demonstrate on an example network that the proposed 
metric leads to a very different ranking than analysis of an 
equivalent static network. We use dynamic centrality to 
study a dynamic citations network and contrast results to 
those reached by static network analysis.

1. INTRODUCTION

The structure of many complex systems, from biological and 
social systems to the World Wide Web and more recently 
the Social Web, can be represented as a network. The ability to
analyze networks in order to identify important nodes and
discover hidden structure has led to important scientific and
technological breakthroughs. As a single profound example,
the PageRank algorithm [23], which ranks Web documents
by analyzing the structure of hyperlinks between them, has
revolutionized both Internet search and commerce. Network
analysis algorithms are also used to discover communities of
like-minded individuals [20], identify influential people [17]
and blogs [18], rank scientists [24] and find important scientific
papers [29, 6, 26]. With few exceptions, these metrics
and algorithms have been applied to static networks. Real- 
world networks, however, are dynamic in nature, because 
their topology can change over time with addition of new 
nodes and edges or removal of existing ones. 

This paper defines a novel centrality metric for dynamic net- 
works. The metric generalizes the path-based centrality used 
in network analysis [4, 12], which measures centrality of a
node by the number of paths, of any length, that connect 
it to other nodes. The dynamic centrality metric exploits 
an intuition that in order for a message sent by one node 
in a network to reach another after some period of time, 
there must exist a path that connects the source and des- 
tination nodes through intermediaries at different times. A 
distinctive feature of this metric is that it is parameterized 
by factors that set both time and length scale of interactions. 
These parameters can in some cases be estimated from data. 
We use dynamic centrality to rank nodes by the number of 
time-dependent paths that connect them to other nodes in 
the network. In addition to discovering best connected, or 
influential, nodes, the method can identify nodes that are 
most connected to a specific node and, therefore, have high- 
est influence on it. We perform detailed analysis of a toy 
dynamic network and show that dynamic network analysis 
can lead to a vastly different ranking than analysis of an 
equivalent static network. We also study a real-world dynamic
network that represents a scientific citations data set.
We find optimal parameters for the metric by fitting it to 
the citation chains' temporal and length distributions. We 
show that dynamic centrality can produce a radically dif- 
ferent view of what the important nodes in the network are 
than static measures and leads to new insights about the 
structure of the dynamic network. 

In Section 2 we review existing research on network analysis
and identify challenges in extending it to dynamic networks.
We define dynamic centrality in Section 3 and present the
mathematical formalism that allows us to compute it from the
snapshots of the network over time. We demonstrate in
Section 3.3 how this metric can be used to rank nodes in a
dynamic network. In Section 4 we apply dynamic centrality to
study the scientific papers citations network and show that
dynamic centrality can lead to a drastically different view of
importance than analysis performed on an equivalent static
network.

2. BACKGROUND AND RELATED WORK

Centrality metrics: Centrality determines a node's importance
in a network. This measure depends on the network
structure. The simplest centrality metric, degree centrality,
measures the number of edges that connect a node
to other nodes in a network. Over the years many more
complex centrality metrics have been proposed and studied,
including Katz status score [16], α-centrality [4], betweenness
centrality [10], and several variants based on random
walk [27, 21, 19], the most famous of which is PageRank [23].
The path-based centrality metrics [4] measure the extent to
which a node can influence, or control how much information
flows to, other nodes in a network.

Consider, specifically, α-centrality defined by Bonacich [4],
which measures the total number of attenuated paths of any
length between nodes i and j. Let A be the adjacency matrix
of a network, such that $A_{ij} = 1$ if an edge exists from i to j
and $A_{ij} = 0$ otherwise. The α-centrality matrix is given by:

$$C(\alpha, \beta) = \beta A + \beta\alpha A \cdot A + \cdots + \beta\alpha^{n} A^{n+1} + \cdots \qquad (1)$$

where β is the attenuation factor along a direct edge (from
the originating node) in a path, and α is the attenuation
factor along an indirect edge (from any intermediate node)
in a path. Although attenuation factors along subsequent
edges in a path could in principle be different, for simplicity,
we take them all to be the same, namely α. The first term in
the equation above gives the number of paths of length one
(edges) from nodes i to j, the second the number of paths
of length two, and so on.

The tunable parameter α sets the length scale of interactions.
For α = 0, α-centrality takes into account direct
edges only and reduces to degree centrality (weighted by β).
As α increases, $C(\alpha, \beta)$ becomes a more global measure,
taking into account more distant interactions. Nodes can
be ranked according to the number of paths that connect
them to other nodes. In previous work [11, 12] we used this
framework to identify both locally and globally influential
nodes, as well as discover the community structure of networks.
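
The series in Eq. 1 can be illustrated with a minimal sketch, assuming a small dense adjacency matrix stored as a list of rows and a truncation after a fixed number of terms. The function names, the `max_terms` parameter, and the three-node chain are hypothetical choices made for illustration only.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def alpha_centrality(A, alpha, beta=1.0, max_terms=10):
    """Truncated Eq. 1: beta*A + beta*alpha*A^2 + ... (max_terms terms)."""
    n = len(A)
    # First term: attenuated paths of length one.
    term = [[beta * A[i][j] for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for _ in range(max_terms - 1):
        # Extend every path by one more edge, attenuated by alpha.
        term = [[alpha * x for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical directed chain 0 -> 1 -> 2 (not a network from the paper).
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]

# Row sums give each node's score: alpha = 0 counts direct edges only,
# while alpha = 0.5 also credits node 0 for its length-two path to node 2.
scores_local = [sum(row) for row in alpha_centrality(A, alpha=0.0)]
scores_global = [sum(row) for row in alpha_centrality(A, alpha=0.5)]
```

With α = 0 the row sums equal the out-degrees, which matches the statement that α-centrality reduces to degree centrality in that limit.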

Dynamic networks: While most network analysis research
has focused on static networks, researchers have recently begun
to study dynamic networks, whose topology changes in time
through the addition or removal of nodes and edges. Authors of [5]
represented a dynamic network by a time series of snapshots
of the network, each of which aggregates links over a time
scale much shorter than the entire observation period. They
studied how degree centrality evolves in a dynamic network.
Authors of [9] observed that activation of links in a dynamic network
creates a flow of information that leads to coherent clusters.
They introduced a metric to study these structures and their
evolution. The metric modifies the traditional clustering
coefficient. Specifically, it measures the number of triangles
in which a node of degree v participates. Similarly, authors of [3]
proposed a formal framework for identifying communities
within dynamic networks based on the temporal structure
of underlying interactions. Our focus in this paper is not
to identify coherent structures or groups in a dynamic network.
Instead, we want to define an intuitive metric that
enables us to rank nodes in a network. We generalize
α-centrality to dynamic networks. Using this metric we can
rank nodes by how well they are connected to other nodes
in the network through time, thereby identifying important
or influential nodes.

Time-aware ranking: Closely related to dynamic network
analysis is the problem of time-aware ranking of Web
pages in information retrieval. This research is motivated
by the observation [1] that PageRank's Web ranking algorithm
is biased against newer pages, which may not have
had enough time to accumulate the links that would give them a high rank.
Several methods have been proposed to address the recency
bias in PageRank, including [1, 30, 2, 8]. In general terms,
these methods weigh edges in the network by age, with newer
edges contributing more heavily to a page's importance. Our
motivation is different. Rather than focus on improving
the rank of newer nodes, we focus instead on defining a
time-aware centrality metric that takes the temporal order
of edges into account.

Authors of [22] considered the temporal order of edges in
the flow of information on a network. They proposed the
EventRank algorithm, a modification of PageRank that takes
into account a temporal sequence of events, e.g., the spread of
an email message, in order to calculate the importance of nodes
in a network. This approach takes into account the effect
of the dynamic process on ranking. In contrast, we consider
the effect of the dynamic network topology on ranking.
These approaches are somewhat related: our method can be
said to estimate the expected value of all temporal sequences
taking place on the network.

Scientific citations: Ranking scientific publications is 
an interesting application for dynamic network analysis. A 
long line of bibliometrics research attempted to define ob- 
jective metrics for identifying important scientific papers, 
researchers, publication venues, and institutions. The now- 
accepted measures for evaluating the impact of papers and 
individual researchers include citations count and h-index ^3 
A breakthrough in this field came with the representation of 
the body of scientific literature as a multi-partite network 
consisting of authors, papers, and publication venues, where 
a link between an author and a paper denotes a researcher's 
authorship of the paper, a link between two papers indi- 
cates a scientific citation, etc. This representation allows 
the structure of the network to be considered in ranking 
papers and authors. 

A scientific papers citations data set can also be considered a
dynamic network, in which newly published papers create
edges to existing papers by citing them. Unlike a generic
dynamic network, however, edges in a citations network are
never destroyed. All previous work treated a citations network
essentially as a static network that aggregates all citation
links created over some time period. The PageRank algorithm
has been applied to such an aggregated network to find the
most influential papers. Authors of [24] divided the entire data
period into homogeneous intervals containing equal numbers
of citations and applied a PageRank-like algorithm to rank
papers and authors within each time slice, thereby enabling
them to study how an author's influence changes in time.
In order to address PageRank's bias for older papers, the authors of [29]
introduced CiteRank, a modified version of PageRank that
explicitly takes a paper's age into account. CiteRank performs
a random walk on a citations graph, but initiates the
walk from a recent paper i chosen randomly with probability
$p_i = e^{-age_i/\tau}$, where $age_i$ is the age of the paper and τ is the
characteristic decay time. The random walk, however, was
performed on an aggregated network. The authors estimated
the parameters of the random walk by fitting papers' CiteRank
scores to the number of citations accrued by them over some
time period. Authors of [26] described FutureRank, an algorithm that
predicts a paper's PageRank score some time in the future.
FutureRank implicitly takes time into account by partitioning
data in time, and using data in one period to predict
a paper's ranking in the next. Similar to the approach of [24],
FutureRank combines influence rankings computed on the
papers and authors networks into a single score. This score
is shown to correlate well with the paper's PageRank score
computed on citations links that will appear in the future.
However, no previous method took the temporal order of
citations edges into account. The method proposed in this
paper, on the other hand, ranks scientific publications by
explicitly taking temporal constraints on citations links into
account.

3. DYNAMIC CENTRALITY METRIC 

We define a dynamic network as a network whose topology changes
over time through addition or removal of edges. Let t be
the smallest time interval in which there is no change in
the topology of the network. Following [5], we represent
the network at time $t_i$ ($i \in 1, \cdots, n$) by a graph $G_{t_i} = (V_{t_i}, E_{t_i})$
with $V_{t_i}$ nodes and $E_{t_i}$ edges between them at time $t_i$. We
define $A(t_i)$ as the adjacency matrix corresponding to $G_{t_i}$.
A time series of network snapshots $G_{t_1}, G_{t_2}, \cdots, G_{t_n}$ (where
$t_i - t_{i-1} < t$) could then be used to represent a dynamic
network over the time period $\Delta_{1,n} = \{t_1, \ldots, t_n\}$.

Figure 1: Example network. (a) Snapshots of the
network showing only connected nodes at times
$t_1$, $t_2$, $t_3$ and $t_4$. (b) A static network that aggregates
different snapshots into a single network.

Figure 1(a) shows four snapshots of a hypothetical dynamic
network, with only connected nodes displayed. Note that
edges are directed. A common method to analyze such a
dynamic network is to create a static network that aggregates
edges observed at all times. Such an aggregate network
is shown in Fig. 1(b). However, aggregating over all edges
loses important temporal information that can help elucidate
the structure of a dynamic network [5]. Consider how
information spreads on a dynamic network. Node i will
be able to send a message to node j at time $t_k$ if and only if
there exists an edge between i and j at that time. Specifically,
consider how a message sent by node 1 may reach node
5. In the static network, there are three acyclic paths from
1 to 5: 1→2→4→5, 1→2→3→4→5, and 1→2→3→5. Not
all these paths are physically realizable, however. If a node
does not retain a message but transmits it in the next time
step, the only meaningful path is 1→2→3→4→5. Using this
intuition, we define a novel centrality metric for dynamic
networks that computes the number of paths from node
i to node j that exist over a period of time.
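
The idea of a physically realizable path can be sketched in a few lines, assuming each snapshot is given as a set of directed edges and that a message must be forwarded in the very next snapshot. The edge assignment below is a hypothetical one chosen to be consistent with the paths named in the text; it is not read off Figure 1, and the function name is an assumption.

```python
# Hypothetical snapshots for t1..t4 (an assumption, not taken from Figure 1).
snapshots = [
    {(1, 2)},
    {(2, 3)},
    {(3, 4), (2, 4)},
    {(4, 5), (3, 5), (5, 2)},
]


def time_respecting_paths(snapshots, source, target):
    """Paths from source to target that use one edge per consecutive snapshot."""
    found = []

    def extend(path, k):
        if k == len(snapshots):
            return
        for (u, v) in sorted(snapshots[k]):
            if u == path[-1]:
                new_path = path + [v]
                if v == target:
                    found.append(new_path)
                # The message must move again in the next snapshot.
                extend(new_path, k + 1)

    # The source may start sending in any snapshot.
    for start in range(len(snapshots)):
        extend([source], start)
    return found


paths = time_respecting_paths(snapshots, 1, 5)
```

Under this assumed edge assignment, the aggregate network contains all three static paths from node 1 to node 5, but only 1→2→3→4→5 respects the temporal order of the edges.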

3.1 Memoryless Formulation

We assume that the future state of the network $G_{t_{k+1}}$ depends
only on its current state $G_{t_k}$ and none of its past
states. This implies that each node propagates information
it receives in the current time step at the very next time step.
We model information spread on a network as a memoryless
dynamic process:

• with probability $\beta_k$ a node initiates transmission of
information by sending a message to its neighbors at
time $t_k$

• with probability $\alpha_k$ a node sends the message it
received at time $t_k$ to its neighbors at time $t_{k+1}$

Although in principle the attenuation factors α and β can
change with time and distance from the source, which can be
easily modeled in this framework, for simplicity we assume
that all $\alpha_k = \alpha$ and $\beta_k = \beta$. The expected amount of
information sent by node i at time $t_1$ that reaches node j at
time $t_n$ via a sequence of intermediate nodes is given by the
(i, j)'s element of the dynamic centrality matrix:



$$C_{t_1 \to t_n}(\beta, \alpha) = \beta A(t_1) + \beta\alpha A(t_1)A(t_2) + \cdots + \beta\alpha^{n-1} A(t_1) \cdots A(t_n) \qquad (2)$$



Let $\Delta_{1,n}$ be the time interval $\{t_1, \ldots, t_n\}$ during which information
propagates from any node i at time $t_k$ to any node j at
time $t_n$, $1 \le k \le n$. The cumulative expected amount of
information reaching node j from node i in a given time
interval $\Delta_{1,n}$ is given by the (i, j)'s element of the cumulative
dynamic centrality matrix:

$$C(\beta, \alpha, \Delta_{1,n}) = \sum_{k=1}^{n} C_{t_k \to t_n}(\beta, \alpha) \qquad (3)$$
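
A minimal sketch of the memoryless formulation follows, assuming snapshots are small dense matrices stored as lists of rows. It builds the ordered-product series of Eq. 2 from each start time and sums the results over all start times to obtain the cumulative matrix. The helper names and the two-snapshot example are hypothetical.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_scale(c, X):
    return [[c * x for x in row] for row in X]


def centrality_from(snapshots, k, alpha, beta=1.0):
    """Series started at snapshot k (0-based): beta*A_k + beta*alpha*A_k*A_{k+1} + ..."""
    term = mat_scale(beta, snapshots[k])
    total = term
    for A in snapshots[k + 1:]:
        # Each later snapshot extends the paths by one edge, in temporal order.
        term = mat_scale(alpha, mat_mul(term, A))
        total = mat_add(total, term)
    return total


def cumulative_centrality(snapshots, alpha, beta=1.0):
    """Sum of the per-start matrices over every start time."""
    n = len(snapshots[0])
    total = [[0.0] * n for _ in range(n)]
    for k in range(len(snapshots)):
        total = mat_add(total, centrality_from(snapshots, k, alpha, beta))
    return total


# Hypothetical snapshots: edge 0 -> 1 is active at t1, edge 1 -> 2 at t2.
A1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
A2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

forward = cumulative_centrality([A1, A2], alpha=0.5)
backward = cumulative_centrality([A2, A1], alpha=0.5)
```

In the forward ordering node 0 reaches node 2 through node 1, so `forward[0][2]` is 0.5. When the same edges occur in the opposite order, no time-respecting path exists and `backward[0][2]` is 0, although the aggregate static network is identical in both cases.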



3.2 Formulation with Memory 

In many dynamic networks, the future state of the network
$G_{t_{k+1}}$ may depend not only on its current state, but also on
(possibly all) its past states $G_{t_i}$ (i < k). In a social network,
for example, two individuals will remember an interaction
they had, even if it happened a long time ago. Since in most
situations more recent interactions are more important, we
model this by introducing memory decay characterized by
the retention probability γ (0 ≤ γ ≤ 1) and retention length
m ($m \in 1, \cdots, n$). We model this as a dynamic process with
the following properties:

• with probability β a node initiates transmission of
information by sending a message to its neighbors at
time $t_k$

• with probability α a node passes the message it
received at time $t_k$ to its neighbors at time $t_{k+1}$

• with probability γ a node retains the message it
received at time $t_k$ until time $t_{k+1}$

The retained adjacency matrix $R(t_n, \gamma)$ at time $t_n$ depends on
adjacency matrices at the previous times:

$$R(t_n, \gamma) = \begin{cases} A(t_n) + \gamma A(t_{n-1}) + \cdots + \gamma^{n-1} A(t_1), & \text{if } n < m \\ A(t_n) + \gamma A(t_{n-1}) + \cdots + \gamma^{m-1} A(t_{n-m+1}), & \text{otherwise} \end{cases}$$
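
The retained adjacency matrix can be sketched as a weighted sum of the current and earlier snapshots, assuming dense matrices stored as lists of rows. The function name, its argument order, and the two-snapshot example are hypothetical; `n` is 1-based to mirror the notation in the text.

```python
def retained_adjacency(snapshots, n, gamma, m):
    """R(t_n, gamma), where snapshots[0] is A(t_1) and m is the retention length."""
    size = len(snapshots[0])
    R = [[0.0] * size for _ in range(size)]
    depth = min(n, m)  # how many snapshots are still remembered at t_n
    for age in range(depth):
        A = snapshots[n - 1 - age]  # A(t_{n-age})
        w = gamma ** age            # older snapshots are discounted more
        for i in range(size):
            for j in range(size):
                R[i][j] += w * A[i][j]
    return R


# Hypothetical snapshots: edge 0 -> 1 at t1, edge 1 -> 2 at t2.
A1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
A2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

with_memory = retained_adjacency([A1, A2], n=2, gamma=0.5, m=2)
no_memory = retained_adjacency([A1, A2], n=2, gamma=0.5, m=1)
```

With retention length 2 the older edge 0→1 survives at half weight, while retention length 1 keeps only the current snapshot. Setting γ = 1 with a large enough m reproduces the aggregate network.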



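Following Section 3.1, the retained dynamic centrality matrix
can then be given as:

$$RC_{t_1 \to t_n}(\beta, \alpha, \gamma) = \beta R(t_1, \gamma) + \beta\alpha R(t_1, \gamma) R(t_2, \gamma) + \beta\alpha^{2} R(t_1, \gamma) R(t_2, \gamma) R(t_3, \gamma) + \cdots + \beta\alpha^{n-1} R(t_1, \gamma) \cdots R(t_n, \gamma) \qquad (4)$$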

and the retained cumulative dynamic centrality matrix over
the time interval $\Delta_{1,n}$ as:

$$RC(\beta, \alpha, \gamma, \Delta_{1,n}) = \sum_{k=1}^{n} RC_{t_k \to t_n}(\beta, \alpha, \gamma) \qquad (5)$$

Figure 2: Dynamic vs static centrality scores for
nodes in the dynamic network shown in Fig. 1. (a)
Total dynamic centrality scores for different values
of γ over time period $\Delta_{1,4}$. (b) Total static
centrality scores for cumulative networks over time
periods $\Delta_{1,2}$, $\Delta_{1,3}$, and $\Delta_{1,4}$. Lines correspond to
α = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0 respectively, from the
bottom.

3.3 Ranking in Dynamic Networks 

A basic problem in network analysis is ranking nodes to
identify important or influential ones. We use the dynamic
centrality metric to rank nodes in a dynamic network. The
intuition behind the ranking scheme is based on the diffusion
of information on a network. Suppose that node i sends
a unit of information at time $t_1$. The expected amount of
information reaching node j from i over a time interval $\Delta_{1,n}$
is given by $RC_{ij}(\alpha, \gamma, \Delta_{1,n})$.^1 The total amount of information
sent by i that reaches all other nodes in the network is
measured by the dynamic centrality of i:

$$DC_i(\alpha, \gamma, \Delta_{1,n}) = \sum_j RC_{ij}(\alpha, \gamma, \Delta_{1,n}) \qquad (6)$$

This metric measures how connected node i is to other nodes
in the network over some period of time $\Delta_{1,n}$. Ranking
nodes by how well connected they are allows us to identify
the most influential nodes in a dynamic network over a period
of time.

^1 Since β factors out of the equations, without loss of generality we set β = 1.

Dynamic programming can be used to efficiently compute
dynamic centrality. As can be seen in Algorithm 1, in each
iteration, $r_i$ depends only on $r_{i-1}$ and $R(t_{n-i}, \gamma)$. Since the
network at time $t_i$ ($i \in 1, \cdots, n$) is given by graph $G_{t_i} = (V_{t_i}, E_{t_i})$,
in the naive implementation of this algorithm,
taking $|E| = |\bigcup_i E_{t_i}|$ and $|V| = |\bigcup_i V_{t_i}|$, each iteration
has a runtime complexity of $O(|E|)$ and space complexity
of $O(|V| + |E|)$. Assuming that the main memory is just
large enough to hold $r_i$, $r_{i-1}$ and $DC(\alpha, \gamma, \Delta_{1,n})$, the I/O
cost for each iteration is $O(|E|)$. If main memory is large
enough to hold only $r_i$, and assuming an efficient data structure
such as a sorted linked list is used to store $R(t_{n-i}, \gamma)$, the I/O cost
is $O(|V| + |E|)$. Since this formulation of dynamic centrality
is very similar to that of PageRank [23], similar block-based
strategies can be used to further improve the speed and
efficiency of computing dynamic centrality [Ts] [15]. Like
PageRank, dynamic centrality can be implemented using
the map-reduce paradigm [7], which lets the algorithm scale
and makes it applicable to very large datasets.



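Algorithm 1 Dynamic centrality
Input
  $\{R(t_k, \gamma) : \forall k \in 1, 2, \cdots, n\}$: retained adjacency matrices
  α, β: attenuation factors
  e: unit vector (n × 1)
Output
  $DC(\alpha, \gamma, \Delta_{1,n})$: dynamic centrality vector
Initialize
  $r_0 \leftarrow \beta R(t_n, \gamma) e$
  $DC(\alpha, \gamma, \Delta_{1,n}) \leftarrow r_0$
for i = 1 to n − 1 do
  $r_i \leftarrow R(t_{n-i}, \gamma)(\beta e + \alpha r_{i-1})$
  $DC(\alpha, \gamma, \Delta_{1,n}) \leftarrow DC(\alpha, \gamma, \Delta_{1,n}) + r_i$
end for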
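
The recursion in Algorithm 1 can be sketched with plain matrix-vector products, assuming the retained adjacency matrices are already available as dense lists of rows ordered from $t_1$ to $t_n$. The function names and the small example are hypothetical, and the running total is seeded with $r_0$ so that the term starting at $t_n$ is included.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dynamic_centrality(retained, alpha, beta=1.0):
    """retained[k] holds R(t_{k+1}, gamma); returns the DC score of every node."""
    n_nodes = len(retained[0])
    e = [1.0] * n_nodes
    # r_0: information that is sent at the last time step only.
    r = [beta * x for x in mat_vec(retained[-1], e)]
    dc = list(r)
    # Walk backwards through R(t_{n-1}), ..., R(t_1), reusing the previous r.
    for R in reversed(retained[:-1]):
        r = mat_vec(R, [beta * ei + alpha * ri for ei, ri in zip(e, r)])
        dc = [d + x for d, x in zip(dc, r)]
    return dc


# Hypothetical example with gamma = 0, so R(t_k) equals A(t_k):
# edge 0 -> 1 at t1 and edge 1 -> 2 at t2.
R1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
R2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

dc = dynamic_centrality([R1, R2], alpha=0.5)
```

Each iteration needs only the previous vector and one retained matrix, which is the property the complexity argument relies on. Here node 0 scores 1.5 (one direct edge plus an attenuated two-step path), node 1 scores 1.0, and node 2 scores 0.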



In addition to ranking nodes, dynamic centrality can be used
to identify nodes that have the most influence on a given
node over some period of time, or have been most influenced
by it. For example, to find the node that is most influenced
by i, we identify node j with the largest value of $RC_{ij}$, given
by Eq. 5. Similarly, $RC_{ji}$ gives the influence of node j on i
and can be used to identify nodes that have had the most
influence on i over some period of time.
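
Reading influence off the cumulative matrix amounts to scanning one row or one column. A minimal sketch, assuming the matrix is a dense list of rows and that a node's influence on itself is ignored (an assumption made here, not stated in the text); the matrix values are hypothetical.

```python
def most_influenced_by(RC, i):
    """Node j (other than i) with the largest RC[i][j], the influence of i on j."""
    candidates = [j for j in range(len(RC)) if j != i]
    return max(candidates, key=lambda j: RC[i][j])


def most_influential_on(RC, i):
    """Node j (other than i) with the largest RC[j][i], the influence of j on i."""
    candidates = [j for j in range(len(RC)) if j != i]
    return max(candidates, key=lambda j: RC[j][i])


# Hypothetical cumulative matrix for three nodes.
RC = [[0.0, 1.0, 0.5],
      [0.0, 0.0, 1.0],
      [0.0, 0.0, 0.0]]

target = most_influenced_by(RC, 0)
source = most_influential_on(RC, 2)
```

In this example node 0 influences node 1 the most, and node 2 is influenced the most by node 1.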

Tunable parameters α and γ enable us to use dynamic centrality
to study the structure of dynamic networks at different
time and length scales. As described in Section 2, α sets
the length scale of interactions. As α grows, longer paths
become more important, and dynamic centrality takes into
account increasingly larger network components. Parameter
γ sets the time scale of the interactions. For γ = 0.0,
only the most recent interactions are taken into account. As
γ grows, older interactions are also considered. In the extreme
case of perfect retention or memory, γ = 1.0, every
past interaction is remembered, similar to how a cumulative
version of a dynamic network is constructed.

We apply dynamic centrality to study the toy network shown
in Fig. 1(a). Figure 2(a) plots the dynamic centrality score of each
node, which is given by $DC_i(\alpha, \gamma, \Delta_{1,4})$. Each plot shows
results for a different value of γ, and each line in the plot
corresponds to a different value of α from 0.0 to 1.0 in steps
of 0.2 from the bottom. For γ < 0.5 node 2 has the highest
score for all values of α, and is therefore highest ranked,
although for α = 0.0, γ = 0.0 node 3 has the same DC score as
node 2. While both 2 and 3 have two outgoing edges, a larger
number of longer paths originate from node 2 (2→3→4→5,
2→4→5, 2→3→5) than node 3 (3→4→5, 3→5). In the case
of perfect memory (γ = 1.0), node 2 is the highest ranked
node for α < 0.4. As longer paths become more important
at larger values of α, node 1's influence grows and it becomes
highest ranked. As the earliest node to send a message, it is
the origin of the longest paths in the network.

We compare dynamic centrality-based rankings with those
produced by an equivalent static metric that computes the
number of attenuated paths in the aggregate network shown
in Fig. 1(b), regardless of the time the links were formed.
To compute the static centrality score, we use $C_i(\alpha) = \sum_j C_{ij}(\alpha)$,
where $C_{ij}(\alpha)$ is given by Eq. 1. Figure 2(b)
shows static centrality scores for cumulative networks that
aggregate edges over time periods $\Delta_{1,2}$, $\Delta_{1,3}$, and $\Delta_{1,4}$. The
aggregate network corresponding to the period $\Delta_{1,4}$ is shown
in Fig. 1(b). Static centrality leads to a radically different
ranking. In the static networks that aggregate edges over
periods $\Delta_{1,2}$ and $\Delta_{1,3}$, node 1 is considered most influential,
except for small values of α in the middle plot, when node
2 becomes more influential. Because of cycles introduced at
the last time step (by the 5→2 edge), the static centrality scores
computed for the network aggregated over the period $\Delta_{1,4}$
(last plot in Fig. 2(b)) grow large with α.^2 Node 2 is most
important for all values of α, followed closely by nodes 1
and 3. Surprisingly, node 5 is judged to be very influential,
surpassing node 4 in score. This is at odds with the dynamic
network, in which only a single path of length one originates
from node 5.

^2 We keep the first 10 terms in the sum in Eq. 1. This keeps the scores from growing too large.

Figure 3: Influence of node 1 on others in the dynamic
vs static centrality formulations. (a) Dynamic
influence of node 1 on others in the dynamic network
for different values of γ over time period $\Delta_{1,4}$.
(b) Static influence of node 1 on others in cumulative
networks over time periods $\Delta_{1,2}$, $\Delta_{1,3}$ and $\Delta_{1,4}$.
Lines correspond to α = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0
respectively, from the bottom.

In addition to ranking nodes, we can look at a given node's
influence on other nodes in the network. Figure 3 shows
the influence of node 1 computed using Eq. 1 and Eq. 5 for
different values of α and γ. Again, the static and dynamic
formulations lead to different views of influence. The dynamic
centrality metric finds that node 1 has most influence on
node 2, although as α and γ increase, its influence on node
4 grows to be comparable to its influence on node 2. This
is reasonable: since node 1 is directly connected to
2, we expect it to have most influence on that node. Node
4 is connected to node 1 through nodes 2 and 3, and will
also be highly influenced by it. Although node 5 is also
linked to 1 by multiple paths, these paths are longer than
those connecting node 1 to 3; therefore, node 1's influence
on 5 should be less than on 4. However, the static centrality
metric applied to the aggregate network finds that node 1
has biggest influence on node 5, followed by 4 and 2. Even
when links are aggregated over a shorter period, node
4 is most influenced by 1 at larger values of α.

In summary, static and dynamic formulations of centrality 
lead to widely different views of importance in a dynamic 
network. We claim that by taking into account constraints 
on information flow imposed by the temporal ordering of 
edges, dynamic centrality formulation leads to a more accu- 
rate understanding of the structure of dynamic networks. 

4. CITATIONS NETWORK 

The citations data set consists of articles uploaded to the
theoretical high energy physics (hep-th) section of the arXiv
preprints server from 1993 to April 2003.^3 There are about
28,000 articles with about 350,000 citations. Each article is
identified by a unique number, with the first two digits representing
the year of submission. The data was cleaned by removing
citations to articles that appeared in the future, as well
as citations of an article to itself.

We partition the data by year to construct snapshots of the
dynamic network in consecutive years. The citations made
by papers uploaded to arXiv during some year form the
edges of the snapshot for that year. A year may not be
an optimal partition of the data, since a small number of
articles published in one year cite others published in the
same year, but it is a convenient time scale to measure scientific
production and interaction between researchers. We
transpose the adjacency matrix to reverse the direction of edges
so that it represents the flow of influence from cited to citing
articles. Citations data can alternately be represented
by a static network that aggregates all edges that appear
over some time period, e.g., 1993-2003. Several researchers
analyzed the structure of the static aggregate network, e.g.,
with the PageRank algorithm, to identify influential articles
[25, 6, 29, 26]. In contrast, we explicitly take the dynamic nature
of the network into account.
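
The snapshot construction can be sketched as follows, assuming citations are available as (citing id, cited id) pairs of seven-character identifiers whose first two digits encode the year. The function names, the year-level test for "future" citations, and the identifiers in the example are all hypothetical; the actual cleaning procedure may have used finer-grained dates.

```python
from collections import defaultdict


def submission_year(article_id):
    """Map the two leading digits of an id to a year (assumes 1993-2003)."""
    yy = int(str(article_id).zfill(7)[:2])
    return 1900 + yy if yy >= 90 else 2000 + yy


def yearly_snapshots(citations):
    """Group (citing, cited) pairs by citing year, with edges reversed."""
    snapshots = defaultdict(set)
    for citing, cited in citations:
        if citing == cited:
            continue  # drop self-citations
        if submission_year(cited) > submission_year(citing):
            continue  # drop citations to articles from a later year
        # Reverse the edge so that influence flows from cited to citing.
        snapshots[submission_year(citing)].add((cited, citing))
    return dict(snapshots)


# Hypothetical identifiers and citations, not taken from the data set.
citations = [
    ("9501002", "9301001"),
    ("9701003", "9501002"),
    ("9701003", "9701003"),  # self-citation, removed
    ("9301001", "9701003"),  # cites a later article, removed
]
snaps = yearly_snapshots(citations)
```

Each value in `snaps` is the edge set of one yearly snapshot, which can then be converted to an adjacency matrix over a fixed node ordering.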

4.1 Parameter Estimation 

The dynamic centrality metric contains parameters α and γ.
While varying their values turns dynamic centrality into a
tool to study the structure of the network at different time
and length scales, a natural question is what the appropriate
values for these parameters are. If we have enough data
about the network, we can estimate them directly from the
data. In this section we describe the methodology to estimate
optimal values of α and γ for the arXiv data.

To estimate α, we find the distribution of citation chains that
span consecutive years. In other words, we set γ = 0, so that
no older citations are retained. $N_j$ gives the total number
of chains of length j that start in year $t_{n-j+1}$ and end in
year $t_n$. Assuming that the probability of picking a chain
is proportional to the probability of transmitting a message
along the chain, $N_j$ decays geometrically with α. Therefore,
the probability of choosing a citations chain of length j is
given by $\alpha^j$. The expected number of citation chains is
$E(N_j) = \alpha E(N_{j-1})$. Figure 4(a) plots the distribution of the
number of chains in the arXiv data set that end in the year
$t_n = 2002$. This distribution is well fit (with $R^2 = 0.9999$)
by $E(N_j) = c \cdot 0.2289^j$, where c = 2.4606 x 10^^. This gives
us α = 0.2289 for the arXiv data set. At this value of α, the
mean path has length 1/(1 − α) ≈ 1.3. This is consistent
with the observation that citations chains have length ≤ 2
[29, 6].

^3 www.cs.cornell.edu/projects/kddcup/datasets.html

Figure 4: Parameter estimation for the arXiv data
set. (a) Distribution of the number of citations
chains of different length with fit. (b) Distribution of
the fraction of citations to papers published x years
previously with fit.
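
A geometric decay of this kind can be estimated with a straight-line fit in log space. The sketch below assumes an ordinary least-squares fit of log counts against chain length, which is one plausible way to obtain such a fit and is not necessarily the procedure used for Figure 4(a). The counts are synthetic and the function name is hypothetical.

```python
import math


def fit_geometric_decay(counts):
    """Fit counts[j-1] ~ c * ratio**j for j = 1..len(counts) by log-linear least squares."""
    xs = list(range(1, len(counts) + 1))
    ys = [math.log(v) for v in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # The slope is log(ratio) and the intercept is log(c).
    return math.exp(slope), math.exp(intercept)


# Synthetic chain counts (hypothetical, not the arXiv data).
counts = [1000.0 * 0.25 ** j for j in range(1, 6)]
alpha, c = fit_geometric_decay(counts)

# Mean path length implied by a geometric decay with ratio alpha.
mean_length = 1.0 / (1.0 - alpha)

# The same formula applied to the value reported in the text.
reported_mean_length = 1.0 / (1.0 - 0.2289)
```

For the reported α = 0.2289 the formula gives about 1.297, which rounds to the value of 1.3 quoted in the text.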



To estimate γ, we assume that citation retention probability
decays geometrically with time [25]. Let $C_k^j$ be the number
of papers at time j − k cited by papers at time j. Since the
number of citations increases in time, we calculate
$W_k^j = C_k^j / \sum_k C_k^j$, the fraction of papers appearing at time j − k
that are cited by papers at time j. Taking the average of
$W_k^j$ over all j gives the expected fraction of citations in a
given paper to papers published k years before it, $E(W_k)$.
Therefore, according to our hypothesis, $E(W_k) = \gamma E(W_{k-1})$.
Figure 4(b) plots this distribution for papers in the arXiv
data set. The data is well fit ($R^2 = 0.9992$) by $E(W_k) = d \cdot (0.7722)^k$,
where d = 0.36. Hence, γ = 0.7722.
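
The age fractions can be computed directly from (citing year, cited year) pairs. This sketch counts citations rather than distinct cited papers and averages over every citing year, treating a missing age as a zero fraction for that year; both choices are assumptions made here. The function name and the example pairs are hypothetical.

```python
from collections import Counter, defaultdict


def citation_age_fractions(year_pairs):
    """Average, over citing years, of the fraction of citations with age k."""
    by_year = defaultdict(Counter)
    for citing_year, cited_year in year_pairs:
        by_year[citing_year][citing_year - cited_year] += 1
    sums = Counter()
    for counts in by_year.values():
        total = sum(counts.values())
        for k, count in counts.items():
            sums[k] += count / total  # fraction of this year's citations with age k
    n_years = len(by_year)
    return {k: s / n_years for k, s in sorted(sums.items())}


# Hypothetical (citing year, cited year) pairs.
pairs = [
    (2001, 2000), (2001, 2000), (2001, 1999), (2001, 1999),
    (2002, 2001), (2002, 2001), (2002, 2001), (2002, 2000),
]
fractions = citation_age_fractions(pairs)
```

The ratio of successive fractions then estimates the retention probability; a log-linear fit over several ages would smooth out year-to-year noise.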





Figure 5: Evolution of influence of three articles. (a)
Dynamic centrality scores computed over a rolling
three-year window vs time. (b) Number of citations
received by papers each year vs time.

4.2 Influence of Individual Articles 

Dynamic centrality, Eq. 6, provides insights into the evolution
of scientific topics and the influence of individual articles.
Figure 5(a) shows how DC scores of three articles change in
time. These articles were randomly chosen from among the
articles ranked highest by PageRank. DC scores of the three
articles were computed over a sliding three-year window using
optimal parameters α = 0.2289 and γ = 0.7722. This
time window means that the longest citations chains DC will
consider are of length three. Since there is evidence that researchers
do not often follow citations links more than two
levels deep [29, 6], a window of size three will adequately
capture longer range interactions in this network. The evolution
of an article's centrality (Fig. 5(a)) shows a similar trend to the
number of new citations it receives each year (Fig. 5(b)).

In addition to ranking articles, dynamic centrality allows us
to directly measure the influence of one article on another.
An article will often directly cite another that influenced it.
At other times, however, we can trace the history of intellectual
contribution through the chain of citations even in the
absence of direct citation. The more citations chains link
an article to a given article, the more influential the former
will be. Table 1 lists the articles found to have the biggest
influence on the three articles in Figure 5. Only a fraction of
these articles are directly cited by the three target articles.
Article 9409089 (by L. Susskind) deals with the relationship
between string theory and black holes. This appears to be
a highly specialized topic. Five of the ten articles found to
have most influence on 9409089 were authored by Susskind
and collaborators. Articles 9503124 (by E. Witten) and
9711200 (by J. Maldacena) deal with the more general topic
of mathematics of string theory. There is significant overlap
in the topics of these papers, as manifested by overlap in the
influencing articles. Interestingly, five of the most influential
articles (9207053, 9209016, 9402002, 9303057, 9304154) were
authored by A. Sen, pointing to that author's importance in
the field. Although we do not report it, it is interesting
to see the papers that were most influenced by the target
papers. All three target papers highly influenced articles
on supersymmetry, supergravity, holographic renormalization,
and AdS/CFT correspondence. Articles 9503124 and
9711200 also influenced papers dealing with "branes", a popular
subfield of string theory that emerged in the late 1990s.
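
The idea of tracing influence through chains of citations can be illustrated with a simplified sketch. Note that this is not Eq. 6: it is a Katz-style attenuated path count on the aggregated citations graph, which ignores the retention parameter γ and the snapshot structure. The function name, the default values, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict

def chain_influence(citations, target, alpha=0.2, max_len=3):
    """Score how strongly each article reaches `target` through citations chains.

    A chain of k citations contributes alpha**(k - 1), so a direct citation
    counts 1 and longer chains are attenuated. Simplified stand-in, not Eq. 6.
    """
    cites = defaultdict(list)            # citing article -> articles it cites
    for citing, cited in citations:
        cites[citing].append(cited)

    scores = defaultdict(float)
    frontier = {target: 1}               # article -> number of chains reaching it
    for length in range(1, max_len + 1):
        next_frontier = defaultdict(int)
        for article, n_chains in frontier.items():
            for cited in cites.get(article, []):
                next_frontier[cited] += n_chains
        for article, n_chains in next_frontier.items():
            scores[article] += n_chains * alpha ** (length - 1)
        frontier = next_frontier
    return dict(scores)

# Hypothetical toy graph: C cites B and A, and B cites A.
toy_edges = [("C", "B"), ("B", "A"), ("C", "A")]
influence_on_c = chain_influence(toy_edges, "C")
```

In the toy graph, article A reaches C both directly and through B, so it scores higher than B even though both are cited once by C.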

While it is difficult for a non-specialist to fully evaluate these
results, they appear to be significant. It is highly unlikely that
the list of papers that highly influenced 9409089 would
fortuitously include so many papers dealing with black holes and
gravity. Likewise, non-existence of magnetic monopoles
violates electric-magnetic symmetry, or duality, which has
apparently attracted much speculation by string theorists.
Appearance of so many papers dealing with these topics in the
list of papers that influenced 9503124 and 9711200 cannot
be coincidental. These observations give us confidence that
dynamic centrality discovers significant relations in the data.

4.3 Overall Influence and Ranking 

In addition to its usefulness in studying trends in citations
data, dynamic centrality can be used to compute the
overall influence of articles over some period and rank them
accordingly. This is a common task in bibliometric analysis.
While many metrics have been developed to address this
problem, the most familiar ones are citations count and PageRank.
Figure 6 shows Spearman's rank correlation coefficient
between DC rankings and rankings based on total citations
count and PageRank. All metrics were computed for the period
1993-2000 inclusive. DC rankings are best correlated
with total citations count for α = 0, γ = 0. This is reasonable,
since at these parameter values only direct edges (i.e.,
citations) contribute to DC. Correlation decreases with both
α and γ, as longer paths and memory are taken into account.
For α → 1, γ → 1, DC rankings are very different from those
based on citations count. Correlation with PageRank,¹ on
the other hand, which was computed on the aggregate static
network, is highest for α = 0, γ = 1. Again, this is expected,
since for these parameter values the dynamic network resembles
the static network. Correlation with PageRank is worst for
α = 1, γ = 0, i.e., when paths of all lengths are taken into
account and past citations are not retained.

Table 1: Ten articles that had the most influence on each of the three target articles computed at optimal
α and γ. Cites column has "1" if the target article cites the listed article. Titles of target articles are: "The
World as a Hologram" (9409089), "String Theory Dynamics In Various Dimensions" (9503124), and "The
Large N Limit of Superconformal Field Theories and Supergravity" (9711200).

Figure 6: Spearman's correlation between dynamic
centrality-based rankings over the period 1993-2000
and rankings based on articles' total citations
count and PageRank over the same time period.
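
The comparison between rankings can be reproduced in outline with Spearman's rank correlation, computed as the Pearson correlation of the two rank vectors. The sketch below is self-contained and uses average ranks for ties; the function names and the toy scores are hypothetical, not values from the arXiv data set.

```python
def average_ranks(scores):
    """Rank values from highest (rank 1) down, giving ties their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5   # undefined if either input is constant

# Hypothetical scores for four articles under two metrics.
dc_scores = [0.9, 0.5, 0.1, 0.3]
citation_counts = [100, 5, 60, 1]
rho = spearman(dc_scores, citation_counts)
```

Repeating this calculation for each (α, γ) pair on a grid would give the kind of correlation surface summarized in Figure 6.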

Table 2 lists the ten articles with the highest DC scores over the
entire time period, along with these articles' total citations
count and rank according to PageRank, also computed over
the entire time period. The top-10 list at α = 0.0 is relatively
insensitive to the value of γ, with only two articles,
9908142 and 9906064, moving out of the top 10 as
γ → 1.0. For this value of α, DC takes only the number of citations
into account, and indeed the list contains articles with
the highest citations counts, which are reported in column
#C.

¹ We used 0.1 as the probability of a random jump in our
implementation of the PageRank algorithm.

In addition to direct citations, DC allows us to take longer
citations chains into account. Increasing α to 0.2 (which
corresponds to an average citations chain of length 1.25)
dramatically alters the rankings. Recent papers drop in rankings
since not enough time had passed to create longer citations
chains to them. For example, article 9711200, which
was ranked 1, moves to position 103. Other papers with far
fewer citations, ~100, move to the top of the list. As γ increases
to its optimal value, three papers (9410167, 9510017,
and 9510135) are replaced in the top-10 list by three new papers
(9209016, 9208055, 9303057). Remarkably, two of them
are by the same author, A. Sen.
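
The chain length quoted for α = 0.2 is consistent with a geometric attenuation of path lengths. As a sketch, assuming that a chain of length k is weighted in proportion to α^(k-1) (an assumption not spelled out in this section) and ignoring the cap that a finite time window places on chain length, the mean chain length works out as:

```latex
\langle k \rangle
  = \frac{\sum_{k=1}^{\infty} k\,\alpha^{k-1}}{\sum_{k=1}^{\infty} \alpha^{k-1}}
  = \frac{1/(1-\alpha)^{2}}{1/(1-\alpha)}
  = \frac{1}{1-\alpha},
\qquad
\alpha = 0.2 \;\Rightarrow\; \langle k \rangle = \frac{1}{0.8} = 1.25 .
```

Under the same assumption, α = 0 gives a mean length of 1, i.e., only direct citations contribute.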

In summary, dynamic centrality leads to a completely different
view of importance than citations count and PageRank.
Only nine of the 20 articles rated highest by PageRank
appear among the top-20 articles rated highest by DC
(using optimal parameter values). Another striking difference
is that Edward Witten authored five of the 20 articles
ranked highest by PageRank, while Ashoke Sen authored
four. Among the 20 articles rated highest by DC, Ashoke
Sen appears as an author seven times and Ed Witten two
times. While Sen may not be as famous as Witten, he is a
major figure in string theory, who had a remarkable ability
to write prescient papers [28]. He is also a prolific author,
the fifth most productive one in the arXiv data set. Dynamic
centrality is able to discover "hidden gems" by this influential
physicist which are overlooked by other metrics.
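
The top-20 comparison reported above amounts to a set overlap between two ranked lists. A minimal sketch, assuming each metric is available as a dictionary from article id to score (the function names, ids, and scores below are hypothetical):

```python
def top_k(scores, k):
    """Return the k highest-scoring article ids, breaking ties by id."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:k]]

def top_k_overlap(scores_a, scores_b, k):
    """Count the articles that appear in the top k under both metrics."""
    return len(set(top_k(scores_a, k)) & set(top_k(scores_b, k)))

# Hypothetical scores for four articles under two metrics.
dc = {"p1": 0.9, "p2": 0.8, "p3": 0.1, "p4": 0.7}
pagerank = {"p1": 0.4, "p2": 0.1, "p3": 0.3, "p4": 0.2}
shared = top_k_overlap(dc, pagerank, 2)
```

With k = 20 and the two real score tables, this count would correspond to the "nine of the 20" figure quoted in the text.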

5. CONCLUSION 

We have presented a novel formulation of centrality for dy- 
namic networks that measures the number of paths that ex- 
ist over time in a network. Given snapshots of the network 
at different times showing the connected nodes, we can cal- 
culate dynamic centrality and use this metric to rank nodes 
by how well connected they are over time to the rest of the 
network. In addition, we can identify nodes that are best
connected to, and therefore exert the most influence on, a given
node. We can also vary the time and length scale parameters
to identify nodes that are globally or locally connected. 

Dynamic centrality gives a different view of importance in a
network than other measures, such as static centrality and
PageRank. We illustrated the differences on an example network.
In addition, we applied dynamic centrality to study
a citations network of scientific papers. Even though this data
set has been extensively studied in the past, we were able to
discover interesting new facts, including influential articles
that were overlooked by other approaches.

Table 2: List of articles with highest total DC scores for α = 0 and α = 0.2, along with their number of citations
(#C) and PageRank (PR) rank.

Citations networks are limited in their dynamics, since edges 
can only appear, but never disappear. We plan to apply our 
approach to more general dynamic networks. 